A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor
In this paper, a siamese DNN model is proposed to learn the characteristics
of the audio dynamic range compressor (DRC). This facilitates an intelligent
control system that uses audio examples to configure the DRC, a widely used
non-linear audio signal conditioning technique in the areas of music
production, speech communication and broadcasting. Several alternative siamese
DNN architectures are proposed to learn feature embeddings that can
characterise subtle effects due to dynamic range compression. These models are
compared with each other as well as with handcrafted features proposed in previous
work. An evaluation of the relations between the DNN hyperparameters and the
DRC parameters is also provided. The best model is able to produce a universal
feature embedding that is capable of predicting multiple DRC parameters
simultaneously, which is a significant improvement over our previous research.
The feature embedding shows better performance than handcrafted audio features
when predicting DRC parameters for both mono-instrument audio loops and
polyphonic music pieces.
Comment: 8 pages, accepted in IJCNN 201
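The abstract does not list which DRC parameters are predicted. Compressors are commonly controlled by a threshold, a ratio, and attack and release times, so the sketch below assumes those four. It is a minimal hard-knee, feed-forward compressor meant only to show what such parameters control. It is not the paper's model or its DRC implementation, and the parameter names are illustrative assumptions. It also simplifies level detection to the instantaneous sample magnitude.

```python
import math

def compress(x, sr, threshold_db=-20.0, ratio=4.0, attack_ms=5.0, release_ms=50.0):
    """Hard-knee feed-forward compressor with smoothed gain reduction.

    threshold_db, ratio, attack_ms and release_ms are assumed, illustrative
    parameter names; the paper's actual parameter set is not stated here.
    """
    # One-pole smoothing coefficients derived from the attack/release time constants
    a_att = math.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (sr * release_ms / 1000.0))
    gr_db = 0.0  # current (smoothed) gain reduction in dB, always <= 0
    out = []
    for s in x:
        # Instantaneous level detection (a simplification; real DRCs often
        # use a peak or RMS envelope follower instead)
        level_db = 20.0 * math.log10(max(abs(s), 1e-9))
        over = level_db - threshold_db
        # Static curve: above threshold, output grows by 1/ratio dB per input dB
        target_db = -over * (1.0 - 1.0 / ratio) if over > 0.0 else 0.0
        # Use the attack constant when more reduction is needed, release otherwise
        coeff = a_att if target_db < gr_db else a_rel
        gr_db = coeff * gr_db + (1.0 - coeff) * target_db
        out.append(s * 10.0 ** (gr_db / 20.0))
    return out
```

With the default settings, a full-scale signal (0 dBFS) sits 20 dB above the threshold. Its steady-state gain reduction is therefore 20 × (1 − 1/4) = 15 dB. Signals below the threshold pass through unchanged. The subtle, parameter-dependent differences between such outputs are the kind of effect the learned embeddings are meant to characterise.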
Testing the Consistency of Performance Scores Reported for Binary Classification Problems
Binary classification is a fundamental task in machine learning, with
applications spanning various scientific domains. Whether scientists are
conducting fundamental research or refining practical applications, they
typically assess and rank classification techniques based on performance
metrics such as accuracy, sensitivity, and specificity. However, reported
performance scores may not always serve as a reliable basis for research
ranking. This can be attributed to undisclosed or unconventional practices
related to cross-validation, typographical errors, and other factors. In a
given experimental setup, with a specific number of positive and negative test
items, most performance scores can assume specific, interrelated values. In
this paper, we introduce numerical techniques to assess the consistency of
reported performance scores and the assumed experimental setup. Importantly,
the proposed approach does not rely on statistical inference but uses numerical
methods to identify inconsistencies with certainty. Through three different
applications related to medicine, we demonstrate how the proposed techniques
can effectively detect inconsistencies, thereby safeguarding the integrity of
research fields. To benefit the scientific community, we have made the
consistency tests available in an open-source Python package.
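A brute-force check can illustrate the core idea: reported scores must be reproducible by some integer confusion matrix for the stated numbers of positive (p) and negative (n) test items. This is a simplified sketch and not the API of the authors' package. `scores_consistent` is a hypothetical name. The sketch assumes sensitivity = TP/p, specificity = TN/n and accuracy = (TP + TN)/(p + n), each reported rounded to `decimals` places, and it handles only a single test set rather than cross-validation.

```python
def scores_consistent(p, n, acc, sens, spec, decimals=4):
    """Check whether some integer confusion matrix reproduces all three scores.

    Hypothetical helper (not the package's API). Assumes sens = TP/p,
    spec = TN/n, acc = (TP + TN)/(p + n), each rounded to `decimals` places.
    """
    if p <= 0 or n <= 0:
        raise ValueError("p and n must be positive")
    # Largest deviation that rounding can introduce, plus slack for float error
    eps = 0.5 * 10 ** (-decimals) + 1e-12
    for tp in range(p + 1):
        if abs(tp / p - sens) > eps:
            continue  # this TP cannot produce the reported sensitivity
        for tn in range(n + 1):
            if abs(tn / n - spec) > eps:
                continue  # this TN cannot produce the reported specificity
            if abs((tp + tn) / (p + n) - acc) <= eps:
                return True  # found a confusion matrix matching all scores
    return False  # no (TP, TN) pair fits: the scores are inconsistent
```

Consider a setup with p = 100 and n = 200. A sensitivity of 0.8 forces TP = 80, and a specificity of 0.75 forces TN = 150. Accuracy must then be 230/300 ≈ 0.7667, so `scores_consistent(100, 200, 0.7667, 0.8, 0.75)` returns `True` and `scores_consistent(100, 200, 0.8, 0.8, 0.75)` returns `False`. The decision is exact for the assumed setup, with no statistical inference involved.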
Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for
singing voice synthesis (SVS) that exploits the physical characteristics of the
human voice using differentiable digital signal processing. GOLF employs a
glottal model as the harmonic source and IIR filters to simulate the vocal
tract, resulting in an interpretable and efficient approach. We show that it is
competitive with state-of-the-art singing voice vocoders, requires fewer
synthesis parameters and less memory to train, and runs an order of magnitude
faster at inference. Additionally, we demonstrate that GOLF can model the
phase components of the human voice, which has immense potential for rendering
and analysing singing voices in a differentiable manner. Our results highlight
the effectiveness of incorporating the physical properties of the human voice
mechanism into SVS and underscore the advantages of signal-processing-based
approaches, which offer greater interpretability and efficiency in synthesis.
Audio samples are available at https://yoyololicon.github.io/golf-demo/.
Comment: 9 pages, 4 figures. Accepted at ISMIR 202
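The source-filter structure described above pairs a glottal source with an IIR (LPC) vocal-tract filter. The sketch below illustrates that structure in plain, non-differentiable form. It is not GOLF: GOLF uses glottal-flow-inspired wavetables and differentiable LPC, whereas this sketch substitutes a Rosenberg-style glottal pulse as the source. The formant frequencies and bandwidths in the usage are hypothetical values. All function names are illustrative.

```python
import math

def rosenberg_pulse_train(f0, sr, n_samples, open_q=0.6, speed_q=2.0):
    """Glottal flow approximated by a Rosenberg-style pulse repeated at f0 (assumed source model)."""
    period = sr / f0
    t_open = open_q * period                      # open phase length in samples
    t_rise = t_open * speed_q / (1.0 + speed_q)   # opening part (speed quotient = rise / fall)
    t_fall = t_open - t_rise                      # closing part
    out = []
    for i in range(n_samples):
        t = i % period  # position within the current period (float modulo)
        if t < t_rise:
            g = 0.5 * (1.0 - math.cos(math.pi * t / t_rise))
        elif t < t_open:
            g = math.cos(0.5 * math.pi * (t - t_rise) / t_fall)
        else:
            g = 0.0  # closed phase
        out.append(g)
    return out

def lpc_synthesis(x, a):
    """All-pole filter 1/A(z), with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    y = []
    for t, s in enumerate(x):
        acc = s
        for k, ak in enumerate(a, start=1):
            if t - k >= 0:
                acc -= ak * y[t - k]
        y.append(acc)
    return y

def formant_coeffs(freqs, bandwidths, sr):
    """Expand a cascade of 2nd-order resonators into one A(z) polynomial."""
    poly = [1.0]
    for f, bw in zip(freqs, bandwidths):
        r = math.exp(-math.pi * bw / sr)  # pole radius from bandwidth
        sec = [1.0, -2.0 * r * math.cos(2.0 * math.pi * f / sr), r * r]
        new = [0.0] * (len(poly) + 2)
        for i, pc in enumerate(poly):     # polynomial multiplication
            for j, sc in enumerate(sec):
                new[i + j] += pc * sc
        poly = new
    return poly[1:]  # drop the leading 1 to obtain a[1..P]

def synthesize(f0, sr, n_samples, formants, bandwidths):
    flow = rosenberg_pulse_train(f0, sr, n_samples)
    # First difference approximates lip radiation (glottal flow derivative)
    excitation = [flow[0]] + [flow[i] - flow[i - 1] for i in range(1, len(flow))]
    return lpc_synthesis(excitation, formant_coeffs(formants, bandwidths, sr))
```

For example, `synthesize(220.0, 16000, 1600, [700.0, 1200.0], [80.0, 90.0])` renders 0.1 s of a crude vowel-like tone. Every parameter here is interpretable (f0, pulse shape, formant positions and widths), which illustrates the interpretability argument above. A differentiable version would implement the same recursions in an autodiff framework so that the parameters could be fitted by gradient descent.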